4 research outputs found
ΠΠΎΡΡΠΎΠ»ΠΎΠ³ΠΈΡ: Π·Π°Π΄Π°ΡΠΈ ΠΏΠΎΡΡΡΠΎΠ΅Π½ΠΈΡ ΡΠ΅Π·Π°ΡΡΡΡΠ° ΠΈ ΡΠΏΠ΅ΡΠΈΡΠΈΠΊΠ°ΡΠΈΠΈ ΡΡΠΈΡ ΠΎΠ²ΠΎΠ³ΠΎ ΡΠ΅ΠΊΡΡΠ°
It is a brief version of the report made by the authors at the seminar βModeling and Analysis of Information Systemsβ, Yaroslavl, May 17, 2017. The interrelation of problems of computeraided constructing the thesaurus and specification of verse texts in poetology is considered.Β ΠΡΠ΅Π΄ΡΡΠ°Π²Π»Π΅Π½ΠΎ ΠΊΡΠ°ΡΠΊΠΎΠ΅ ΠΈΠ·Π»ΠΎΠΆΠ΅Π½ΠΈΠ΅ Π΄ΠΎΠΊΠ»Π°Π΄Π° Π°Π²ΡΠΎΡΠΎΠ² Π½Π° ΡΠ΅ΠΌΠΈΠ½Π°ΡΠ΅ βΠΠΎΠ΄Π΅Π»ΠΈΡΠΎΠ²Π°Π½ΠΈΠ΅ ΠΈ Π°Π½Π°Π»ΠΈΠ· ΠΈΠ½ΡΠΎΡΠΌΠ°ΡΠΈΠΎΠ½Π½ΡΡ
ΡΠΈΡΡΠ΅ΠΌβ, Π―ΡΠΎΡΠ»Π°Π²Π»Ρ, 17 ΠΌΠ°Ρ 2017 Π³. Π Π½ΡΠΌ ΡΠ°ΡΡΠΌΠ°ΡΡΠΈΠ²Π°Π΅ΡΡΡ Π²Π·Π°ΠΈΠΌΠΎΡΠ²ΡΠ·Ρ Π·Π°Π΄Π°Ρ ΠΏΠΎ Π°Π²ΡΠΎΠΌΠ°ΡΠΈΠ·Π°ΡΠΈΠΈ ΠΏΠΎΡΡΡΠΎΠ΅Π½ΠΈΡ ΡΠ΅Π·Π°ΡΡΡΡΠ° ΠΈ ΡΠΏΠ΅ΡΠΈΡΠΈΠΊΠ°ΡΠΈΠΈ ΡΠ΅ΠΊΡΡΠ° ΡΡΠΈΡ
ΠΎΡΠ²ΠΎΡΠ½ΠΎΠ³ΠΎ ΠΏΡΠΎΠΈΠ·Π²Π΅Π΄Π΅Π½ΠΈΡ Π² ΠΏΠΎΡΡΠΎΠ»ΠΎΠ³ΠΈΠΈ.
ΠΠ΅ΠΊΡΠΎΡΠ½ΠΎΠ΅ ΠΏΡΠ΅Π΄ΡΡΠ°Π²Π»Π΅Π½ΠΈΠ΅ ΡΠ»ΠΎΠ² Ρ ΡΠ΅ΠΌΠ°Π½ΡΠΈΡΠ΅ΡΠΊΠΈΠΌΠΈ ΠΎΡΠ½ΠΎΡΠ΅Π½ΠΈΡΠΌΠΈ: ΡΠΊΡΠΏΠ΅ΡΠΈΠΌΠ΅Π½ΡΠ°Π»ΡΠ½ΡΠ΅ Π½Π°Π±Π»ΡΠ΄Π΅Π½ΠΈΡ
The ability to identify semantic relations between words has made a word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule that a higher similarity can be reached if two words have a similar context. Each word can be represented as a vector, so the closest coordinates of vectors can be interpreted as similar words. It allows to establish semantic relations (synonymy, relations of hypernymy and hyponymy and other semantic relations) by applying an automatic extraction. The extraction of semantic relations by hand is considered as a time-consuming and biased task, requiring a large amount of time and some help of experts. Unfortunately, the word2vec model provides an associative list of words which does not consist of relative words only. In this paper, we show some additional criteria that may be applicable to solve this problem. Observations and experiments with well-known characteristics, such as word frequency, a position in an associative list, might be useful for improving results for the task of extraction of semantic relations for the Russian language by using word embedding. In the experiments, the word2vec model trained on the Flibusta and pairs from Wiktionary are used as examples with semantic relationships. Semantically related words are applicable to thesauri, ontologies and intelligent systems for natural language processing.ΠΠΎΠ·ΠΌΠΎΠΆΠ½ΠΎΡΡΡ ΠΈΠ΄Π΅Π½ΡΠΈΡΠΈΠΊΠ°ΡΠΈΠΈ ΡΠ΅ΠΌΠ°Π½ΡΠΈΡΠ΅ΡΠΊΠΎΠΉ Π±Π»ΠΈΠ·ΠΎΡΡΠΈ ΠΌΠ΅ΠΆΠ΄Ρ ΡΠ»ΠΎΠ²Π°ΠΌΠΈ ΡΠ΄Π΅Π»Π°Π»Π° ΠΌΠΎΠ΄Π΅Π»Ρ word2vec ΡΠΈΡΠΎΠΊΠΎ ΠΈΡΠΏΠΎΠ»ΡΠ·ΡΠ΅ΠΌΠΎΠΉ Π² NLP-Π·Π°Π΄Π°ΡΠ°Ρ
. ΠΠ΄Π΅Ρ word2vec ΠΎΡΠ½ΠΎΠ²Π°Π½Π° Π½Π° ΠΊΠΎΠ½ΡΠ΅ΠΊΡΡΠ½ΠΎΠΉ Π±Π»ΠΈΠ·ΠΎΡΡΠΈ ΡΠ»ΠΎΠ². ΠΠ°ΠΆΠ΄ΠΎΠ΅ ΡΠ»ΠΎΠ²ΠΎ ΠΌΠΎΠΆΠ΅Ρ Π±ΡΡΡ ΠΏΡΠ΅Π΄ΡΡΠ°Π²Π»Π΅Π½ΠΎ Π² Π²ΠΈΠ΄Π΅ Π²Π΅ΠΊΡΠΎΡΠ°, Π±Π»ΠΈΠ·ΠΊΠΈΠ΅ ΠΊΠΎΠΎΡΠ΄ΠΈΠ½Π°ΡΡ Π²Π΅ΠΊΡΠΎΡΠΎΠ² ΠΌΠΎΠ³ΡΡ Π±ΡΡΡ ΠΈΠ½ΡΠ΅ΡΠΏΡΠ΅ΡΠΈΡΠΎΠ²Π°Π½Ρ ΠΊΠ°ΠΊ Π±Π»ΠΈΠ·ΠΊΠΈΠ΅ ΠΏΠΎ ΡΠΌΡΡΠ»Ρ ΡΠ»ΠΎΠ²Π°. Π’Π°ΠΊΠΈΠΌ ΠΎΠ±ΡΠ°Π·ΠΎΠΌ, ΠΈΠ·Π²Π»Π΅ΡΠ΅Π½ΠΈΠ΅ ΡΠ΅ΠΌΠ°Π½ΡΠΈΡΠ΅ΡΠΊΠΈΡ
ΠΎΡΠ½ΠΎΡΠ΅Π½ΠΈΠΉ (ΠΎΡΠ½ΠΎΡΠ΅Π½ΠΈΠ΅ ΡΠΈΠ½ΠΎΠ½ΠΈΠΌΠΈΠΈ, ΡΠΎΠ΄ΠΎ-Π²ΠΈΠ΄ΠΎΠ²ΡΠ΅ ΠΎΡΠ½ΠΎΡΠ΅Π½ΠΈΡ ΠΈ Π΄ΡΡΠ³ΠΈΠ΅) ΠΌΠΎΠΆΠ΅Ρ Π±ΡΡΡ Π°Π²ΡΠΎΠΌΠ°ΡΠΈΠ·ΠΈΡΠΎΠ²Π°Π½ΠΎ. Π£ΡΡΠ°Π½ΠΎΠ²Π»Π΅Π½ΠΈΠ΅ ΡΠ΅ΠΌΠ°Π½ΡΠΈΡΠ΅ΡΠΊΠΈΡ
ΠΎΡΠ½ΠΎΡΠ΅Π½ΠΈΠΉ Π²ΡΡΡΠ½ΡΡ ΡΡΠΈΡΠ°Π΅ΡΡΡ ΡΡΡΠ΄ΠΎΠ΅ΠΌΠΊΠΎΠΉ ΠΈ Π½Π΅ΠΎΠ±ΡΠ΅ΠΊΡΠΈΠ²Π½ΠΎΠΉ Π·Π°Π΄Π°ΡΠ΅ΠΉ, ΡΡΠ΅Π±ΡΡΡΠ΅ΠΉ Π±ΠΎΠ»ΡΡΠΎΠ³ΠΎ ΠΊΠΎΠ»ΠΈΡΠ΅ΡΡΠ²Π° Π²ΡΠ΅ΠΌΠ΅Π½ΠΈ ΠΈ ΠΏΡΠΈΠ²Π»Π΅ΡΠ΅Π½ΠΈΡ ΡΠΊΡΠΏΠ΅ΡΡΠΎΠ². ΠΠΎ ΡΡΠ΅Π΄ΠΈ Π°ΡΡΠΎΡΠΈΠ°ΡΠΈΠ²Π½ΡΡ
ΡΠ»ΠΎΠ², ΡΡΠΎΡΠΌΠΈΡΠΎΠ²Π°Π½Π½ΡΡ
Ρ ΠΈΡΠΏΠΎΠ»ΡΠ·ΠΎΠ²Π°Π½ΠΈΠ΅ΠΌ ΠΌΠΎΠ΄Π΅Π»ΠΈ word2vec, Π²ΡΡΡΠ΅ΡΠ°ΡΡΡΡ ΡΠ»ΠΎΠ²Π°, Π½Π΅ ΠΏΡΠ΅Π΄ΡΡΠ°Π²Π»ΡΡΡΠΈΠ΅ Π½ΠΈΠΊΠ°ΠΊΠΈΡ
ΠΎΡΠ½ΠΎΡΠ΅Π½ΠΈΠΉ Ρ Π³Π»Π°Π²Π½ΡΠΌ ΡΠ»ΠΎΠ²ΠΎΠΌ, Π΄Π»Ρ ΠΊΠΎΡΠΎΡΠΎΠ³ΠΎ Π±ΡΠ» ΠΏΡΠ΅Π΄ΡΡΠ°Π²Π»Π΅Π½ Π°ΡΡΠΎΡΠΈΠ°ΡΠΈΠ²Π½ΡΠΉ ΡΡΠ΄. Π ΡΠ°Π±ΠΎΡΠ΅ ΡΠ°ΡΡΠΌΠ°ΡΡΠΈΠ²Π°ΡΡΡΡ Π΄ΠΎΠΏΠΎΠ»Π½ΠΈΡΠ΅Π»ΡΠ½ΡΠ΅ ΠΊΡΠΈΡΠ΅ΡΠΈΠΈ, ΠΊΠΎΡΠΎΡΡΠ΅ ΠΌΠΎΠ³ΡΡ Π±ΡΡΡ ΠΏΡΠΈΠΌΠ΅Π½ΠΈΠΌΡ Π΄Π»Ρ ΡΠ΅ΡΠ΅Π½ΠΈΡ Π΄Π°Π½Π½ΠΎΠΉ ΠΏΡΠΎΠ±Π»Π΅ΠΌΡ. ΠΠ°Π±Π»ΡΠ΄Π΅Π½ΠΈΡ ΠΈ ΠΏΡΠΎΠ²Π΅Π΄Π΅Π½Π½ΡΠ΅ ΡΠΊΡΠΏΠ΅ΡΠΈΠΌΠ΅Π½ΡΡ Ρ ΠΎΠ±ΡΠ΅ΠΈΠ·Π²Π΅ΡΡΠ½ΡΠΌΠΈ Ρ
Π°ΡΠ°ΠΊΡΠ΅ΡΠΈΡΡΠΈΠΊΠ°ΠΌΠΈ, ΡΠ°ΠΊΠΈΠΌΠΈ ΠΊΠ°ΠΊ ΡΠ°ΡΡΠΎΡΠ° ΡΠ»ΠΎΠ², ΠΏΠΎΠ·ΠΈΡΠΈΡ Π² Π°ΡΡΠΎΡΠΈΠ°ΡΠΈΠ²Π½ΠΎΠΌ ΡΡΠ΄Ρ, ΠΌΠΎΠ³ΡΡ Π±ΡΡΡ ΠΈΡΠΏΠΎΠ»ΡΠ·ΠΎΠ²Π°Π½Ρ Π΄Π»Ρ ΡΠ»ΡΡΡΠ΅Π½ΠΈΡ ΡΠ΅Π·ΡΠ»ΡΡΠ°ΡΠΎΠ² ΠΏΡΠΈ ΡΠ°Π±ΠΎΡΠ΅ Ρ Π²Π΅ΠΊΡΠΎΡΠ½ΡΠΌ ΠΏΡΠ΅Π΄ΡΡΠ°Π²Π»Π΅Π½ΠΈΠ΅ΠΌ ΡΠ»ΠΎΠ² Π² ΡΠ°ΡΡΠΈ ΠΎΠΏΡΠ΅Π΄Π΅Π»Π΅Π½ΠΈΡ ΡΠ΅ΠΌΠ°Π½ΡΠΈΡΠ΅ΡΠΊΠΈΡ
ΠΎΡΠ½ΠΎΡΠ΅Π½ΠΈΠΉ Π΄Π»Ρ ΡΡΡΡΠΊΠΎΠ³ΠΎ ΡΠ·ΡΠΊΠ°. Π ΡΠΊΡΠΏΠ΅ΡΠΈΠΌΠ΅Π½ΡΠ°Ρ
ΠΈΡΠΏΠΎΠ»ΡΠ·ΡΠ΅ΡΡΡ ΠΎΠ±ΡΡΠ΅Π½Π½Π°Ρ Π½Π° ΠΊΠΎΡΠΏΡΡΠ°Ρ
Π€Π»ΠΈΠ±ΡΡΡΡ ΠΌΠΎΠ΄Π΅Π»Ρ word2vec ΠΈ ΡΠ°Π·ΠΌΠ΅ΡΠ΅Π½Π½ΡΠ΅ Π΄Π°Π½Π½ΡΠ΅ ΠΠΈΠΊΠΈΡΠ»ΠΎΠ²Π°ΡΡ Π² ΠΊΠ°ΡΠ΅ΡΡΠ²Π΅ ΠΎΠ±ΡΠ°Π·ΡΠΎΠ²ΡΡ
ΠΏΡΠΈΠΌΠ΅ΡΠΎΠ², Π² ΠΊΠΎΡΠΎΡΡΡ
ΠΎΡΡΠ°ΠΆΠ΅Π½Ρ ΡΠ΅ΠΌΠ°Π½ΡΠΈΡΠ΅ΡΠΊΠΈΠ΅ ΠΎΡΠ½ΠΎΡΠ΅Π½ΠΈΡ. Π‘Π΅ΠΌΠ°Π½ΡΠΈΡΠ΅ΡΠΊΠΈ ΡΠ²ΡΠ·Π°Π½Π½ΡΠ΅ ΡΠ»ΠΎΠ²Π° (ΠΈΠ»ΠΈ ΡΠ΅ΡΠΌΠΈΠ½Ρ) Π½Π°ΡΠ»ΠΈ ΡΠ²ΠΎΠ΅ ΠΏΡΠΈΠΌΠ΅Π½Π΅Π½ΠΈΠ΅ Π² ΡΠ΅Π·Π°ΡΡΡΡΠ°Ρ
, ΠΎΠ½ΡΠΎΠ»ΠΎΠ³ΠΈΡΡ
, ΠΈΠ½ΡΠ΅Π»Π»Π΅ΠΊΡΡΠ°Π»ΡΠ½ΡΡ
ΡΠΈΡΡΠ΅ΠΌΠ°Ρ
Π΄Π»Ρ ΠΎΠ±ΡΠ°Π±ΠΎΡΠΊΠΈ Π΅ΡΡΠ΅ΡΡΠ²Π΅Π½Π½ΠΎΠ³ΠΎ ΡΠ·ΡΠΊΠ°
Poetology: Problems of Constructing a Thesaurus and Verse Text Specification
It is a brief version of the report made by the authors at the seminar βModeling and Analysis of Information Systemsβ, Yaroslavl, May 17, 2017. The interrelation of problems of computeraided constructing the thesaurus and specification of verse texts in poetology is considered
Word Embedding for Semantically Relative Words: an Experimental Study
The ability to identify semantic relations between words has made a word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule that a higher similarity can be reached if two words have a similar context. Each word can be represented as a vector, so the closest coordinates of vectors can be interpreted as similar words. It allows to establish semantic relations (synonymy, relations of hypernymy and hyponymy and other semantic relations) by applying an automatic extraction. The extraction of semantic relations by hand is considered as a time-consuming and biased task, requiring a large amount of time and some help of experts. Unfortunately, the word2vec model provides an associative list of words which does not consist of relative words only. In this paper, we show some additional criteria that may be applicable to solve this problem. Observations and experiments with well-known characteristics, such as word frequency, a position in an associative list, might be useful for improving results for the task of extraction of semantic relations for the Russian language by using word embedding. In the experiments, the word2vec model trained on the Flibusta and pairs from Wiktionary are used as examples with semantic relationships. Semantically related words are applicable to thesauri, ontologies and intelligent systems for natural language processing